Neural decision boundaries for maximal information transmission 
We consider here how to separate multidimensional signals into two categories, such that the
binary decision transmits the maximum possible information about those signals. Our
motivation comes from the nervous system, where neurons process multidimensional signals into a 
binary sequence of responses (spikes). In a small noise limit, we derive a general equation for the 
decision boundary that locally relates its curvature to the probability distribution of inputs. We 
show that for Gaussian inputs the optimal boundaries are planar, but for non-Gaussian inputs the 
curvature is nonzero. As an example, we consider exponentially distributed inputs, which are known 
to approximate a variety of signals from the natural environment.
What we know about the world around us is represented in the nervous system by
sequences of discrete electrical pulses termed action potentials or "spikes".
One attractive theoretical idea, going back to the 1950s, is that these
representations constructed by the brain are efficient in the sense of
information theory. These ideas have been formalized to predict the spatial
and temporal filtering properties of neurons, as well as the shapes of
nonlinear input/output relations [10], showing how these measured behaviors of
cells can be understood as optimally matched to the statistical properties of
natural sensory inputs. There have been attempts, particularly in the auditory
system, to test directly the prediction that the coding of naturalistic inputs
is more efficient, and this concept of matching has been used also to predict
new forms of adaptation to the input statistics. Despite this progress,
relatively little attention has been given to the problem of optimal coding in
the presence of the strong, threshold-like nonlinearities associated with the
generation of spikes.

Sensory inputs to the brain are intrinsically high dimensional objects. For
example, visual neurons encode various patterns of light intensities that,
upon moderate discretization, become vectors in a space of $10^2$ to $10^3$
dimensions. We can think of the "decision" to generate an action potential as
drawing boundaries in these high dimensional spaces, so that a theory of
optimal coding for spiking neurons is really a theory for the shape of these
boundaries. In the simplest perceptron-like models [25], boundaries are
planar, and spiking thus is determined by only a single (Euclidean) projection
of the stimulus onto a vector normal to the dividing plane. In the perceptron
limit, the optimal choice of decision boundaries reduces to the choice of an
optimal linear filter. But a number of recent experiments suggest that
neurons, even in early stages of sensory processing, are sensitive to multiple
stimulus projections, with intrinsically curved decision boundaries. Here we
try to develop a theory of optimal coding for spiking neurons in which these
curved boundaries emerge naturally.

We consider a much simplified version of the full problem. We look at a single
neuron, and focus on a small window of time in which that cell either does or
does not generate an action potential. We ignore, in this first attempt,
coding strategies that involve patterns of spikes across multiple neurons or
across time in single neurons, and ask simply how much information the binary
spike/no spike decision conveys about the input signal. Let this input signal
be a vector $\mathbf{r}$ in a space of $d$ dimensions [31] and let the
distribution of these signals be given by $P(\mathbf{r})$. If the binary
output of the neuron is $\sigma$, we are interested in calculating the mutual
information $I(\sigma; \mathbf{r})$ between $\sigma$ and the input
$\mathbf{r}$.

We can write the information as a difference between two entropies, the
response entropy and the noise entropy:
$I(\sigma; \mathbf{r}) = H_{\rm response} - H_{\rm noise}$. In our simplified
problem, with a single neuron giving binary responses, the response entropy,

$$H_{\rm response} = -p \log p - (1-p) \log(1-p), \tag{1}$$

is completely determined by the average spike probability $p$. We might
imagine that this probability is set by constraints outside the problem of
coding itself. For example, generating spikes costs energy, and so metabolic
constraints might fix the mean spike rate [33, 34, 35, 36]. Our problem, then,
is to find coding strategies that minimize the noise entropy at fixed $p$.

In the absence of noise, the coding scheme which maps signals into spikes (or
not) is a boundary in the $d$-dimensional space of inputs. If the domain in
which spikes occur is $G$, then $p = \int_G d^d r\, P(\mathbf{r})$. If the
noise truly were zero, all codes with the same value of $p$ would transmit the
same amount of information, and there would be an infinite set of nominally
optimal domains $G$.

FIG. 1: Comparison of noise entropies for straight line solutions (1) and
circles with spiking on the outside (2) or inside (2') for Gaussian inputs.
The entropy for the circular solution depends on the dimensionality $d$ of the
inputs, as illustrated here in the case of spiking outside of a circle.

We will work in an approximation where the noise is small and additive with
the input. Then if the boundary of the spiking domain is some
($d-1$ dimensional) surface $\gamma$, we expect that responses far from this
boundary are essentially deterministic and do not contribute to the noise
entropy; all of the contributions to $H_{\rm noise}$ should arise from a
narrow strip surrounding the boundary $\gamma$. Within this strip, the
response is almost completely uncertain. Thus we can approximate the noise
entropy by saying that it is $\sim 1$ bit inside the strip, and zero outside;
the total noise entropy is then the mass of probability inside the strip. The
width of the strip is proportional to the strength of the noise, and if the
noise is small the probability distribution of inputs does not vary
significantly across this width, so we can write the overall noise entropy as
an integral along the decision boundary $\gamma$:

$$H_{\rm noise}[\gamma; P(\mathbf{r})] \approx \sigma \int_\gamma ds\, P(\mathbf{r}), \tag{2}$$



where $ds$ is the infinitesimal surface element of dimension $d-1$ on the
decision boundary $\gamma$ and $\sigma$ is the amplitude of the noise. The
exact shape of the nonlinear function describing how spike probability changes
across the domain boundary might introduce an additional numerical factor of
order unity in Eq. (2), but this can always be incorporated into the
definition of $\sigma$ as the effective noise level.
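The strip argument behind Eq. (2) can be checked numerically in one dimension. The sketch below is an illustration under stated assumptions, not part of the original analysis: the input is a unit-variance Gaussian, the spike probability rises across the threshold as a cumulative Gaussian of width `sigma`, and entropies are in bits. The function names are hypothetical. If Eq. (2) holds, the ratio $H_{\rm noise} / (\sigma P(\theta))$ should be a constant of order unity that does not depend on `sigma`.

```python
import math

def binary_entropy_bits(q):
    """Entropy in bits of a binary variable with P(spike) = q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def gaussian_pdf(x):
    """Unit-variance Gaussian input density (assumed for this illustration)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exact_noise_entropy(theta, sigma, half_width=12.0, n=4000):
    """Noise entropy of a 1D threshold at theta with Gaussian noise of width sigma.

    Trapezoidal rule over a strip of +/- half_width * sigma around theta;
    outside the strip the response is effectively deterministic.
    """
    start = theta - half_width * sigma
    step = 2.0 * half_width * sigma / n
    total = 0.0
    for k in range(n + 1):
        x = start + k * step
        # Assumed soft threshold: cumulative Gaussian in (x - theta) / sigma.
        p_spike = 0.5 * (1.0 + math.erf((x - theta) / (sigma * math.sqrt(2.0))))
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * gaussian_pdf(x) * binary_entropy_bits(p_spike)
    return total * step

def strip_factor(theta, sigma):
    """Ratio H_noise / (sigma * P(theta)); Eq. (2) predicts a sigma-independent constant."""
    return exact_noise_entropy(theta, sigma) / (sigma * gaussian_pdf(theta))

if __name__ == "__main__":
    for sigma in (0.01, 0.02, 0.05):
        print(sigma, strip_factor(1.0, sigma))
```

The constant returned by `strip_factor` plays the role of the numerical factor of order unity that the text absorbs into the effective noise level.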

While our choice of threshold-like transitions between spiking and non-spiking
regions considerably narrows the types of possible input-output
transformations, it still leads, as we show below, to highly nontrivial, yet
tractable, solutions. We will treat the noise length scale $\sigma$ as one of
the pre-defined parameters; it can take arbitrary positive values and will set
the units for measuring the contours' curvature and
$\nabla \ln P(\mathbf{r})$.

Taking into account that the response entropy $H_{\rm response}$ only depends
on the average spike probability, the optimal contour providing maximal
information may be found by minimizing

$$F = \sigma \int_\gamma ds\, P(\mathbf{r}) - \lambda \left( p - \int_G d^d r\, P(\mathbf{r}) \right), \tag{3}$$



where $\lambda$ is the Lagrange multiplier incorporating the constraint for
the average spike probability $p$. For an optimal contour, the first order
variation of $F$ with respect to local perturbations $\delta\mathbf{r}$ in the
contour's shape should be zero. Only perturbations along the surface normal
can change the value of the functional. Both the infinitesimal surface element
and the probability values along the decision boundary are subject to change:



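$$\delta F = \int_\gamma ds\, \delta r_\perp(s) \left[ \sigma\, \mathbf{t}_i \cdot \frac{\partial \hat{\mathbf{n}}}{\partial s_i}\, P(\mathbf{r}) + \sigma\, \hat{\mathbf{n}} \cdot \nabla P + \lambda P(\mathbf{r}) \right], \tag{4}$$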

where the set of vectors $\{\mathbf{t}_1, \mathbf{t}_2, \ldots, \mathbf{t}_{d-1}\}$
defines the tangent plane, $\hat{\mathbf{n}}$ defines the normal, and we use
the summation convention for the index $i$. Because perturbations at various
points along the surface are independent, the optimal contour should satisfy:



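$$\lambda + \kappa + \hat{\mathbf{n}} \cdot \nabla \ln P = 0, \tag{5}$$

$$\kappa = \mathbf{t}_i \cdot \frac{\partial \hat{\mathbf{n}}}{\partial s_i} = {\rm div}\, \hat{\mathbf{n}}, \tag{6}$$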



where $\kappa$ is the mean curvature of the decision boundary $\gamma$. Notice
that we have rescaled $\lambda$ by a factor of $\sigma$, so there is only one
parameter in the problem.

Below we solve Eq. (5) to find optimal decision boundaries for two example
probability distributions: a Gaussian and the exponential. The exponential
distribution is important not only as an example of non-Gaussian inputs, but
also because it captures some of the essential statistical properties found in
real-world signals [37, 38].

Consider the case of uncorrelated Gaussian inputs
$P(\mathbf{r}) = (2\pi)^{-d/2} \exp(-r^2/2)$, where the equation for optimal
contours takes the form:

$$\lambda + \kappa - \hat{\mathbf{n}} \cdot \mathbf{r} = 0. \tag{7}$$



The families of possible solutions include circles [$\lambda = r - (d-1)/r$,
where $r$ is the circle radius] and straight lines [$\kappa = 0$,
$\lambda = r$, where $r$ is the smallest distance from the line to the
origin]. Circles and straight lines turn out to be the only possible smooth
contours [39].

To choose between circles and straight lines, we calculate the noise entropy
as a function of spike probability $p$ in both cases. From Eq. (2), we see
that $H_{\rm noise}$ is proportional to the noise level $\sigma$, so in what
follows we compute the noise entropy in these units. For straight lines a
distance $r$ from the origin, $H_{\rm line} = \exp(-r^2/2)/\sqrt{2\pi}$ and
$P_{\rm line} = [1 - {\rm erf}(r/\sqrt{2})]/2$. For a circle in two
dimensions, $H_{\rm circle} = r \exp(-r^2/2)$ and
$P_{\rm outside\text{-}a\text{-}circle} = \exp(-r^2/2)$. Expressions for
entropy and probability for straight lines do not change with dimensionality
$d$, while the corresponding values for circles are:

$$H^{(d)}_{\rm circle} = \frac{2 r^{d-1} e^{-r^2/2}}{2^{d/2}\, \Gamma(d/2)}, \tag{8}$$

$$P^{(d)}_{\rm outside\text{-}a\text{-}circle} = \frac{\Gamma(d/2,\, r^2/2)}{\Gamma(d/2)}, \tag{9}$$
where $\Gamma(n, x) = \int_x^\infty dt\, e^{-t} t^{n-1}$ is the incomplete
Gamma function. In Fig. 1 we plot these solutions to show that for any
probability $p$ and dimensionality $d$, the optimal separation is with
straight boundaries. This result also holds for correlated Gaussian inputs,
where the optimal hyperplane is the one which intersects the axis of largest
variance and is parallel to other coordinate axes.
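The comparison between planar and spherical boundaries for Gaussian inputs can be reproduced with a few lines of code. The sketch below is illustrative only and uses hypothetical helper names. It evaluates the straight-line expressions and Eq. (8) directly, and obtains the probability outside a sphere by integrating Eq. (8) over the radius with Simpson's rule, an assumption made here to stay within the standard library rather than calling an incomplete Gamma routine. Radii are matched to a target spike probability by bisection.

```python
import math

def line_entropy(r):
    """Noise entropy (in units of sigma) of a hyperplane at distance r from the origin."""
    return math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)

def line_probability(r):
    """Spike probability on the far side of that hyperplane."""
    return 0.5 * (1.0 - math.erf(r / math.sqrt(2.0)))

def sphere_entropy(r, d):
    """Eq. (8): noise entropy of a sphere of radius r in d dimensions."""
    return 2.0 * r ** (d - 1) * math.exp(-0.5 * r * r) / (2.0 ** (0.5 * d) * math.gamma(0.5 * d))

def sphere_probability_outside(r, d, span=12.0, n=1000):
    """Probability mass outside radius r, as the radial integral of Eq. (8) (Simpson's rule)."""
    step = span / n
    total = sphere_entropy(r, d) + sphere_entropy(r + span, d)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * sphere_entropy(r + k * step, d)
    return total * step / 3.0

def solve_decreasing(func, target, lo, hi, iters=60):
    """Bisection for func(x) = target when func is decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def entropies_at_probability(p, d):
    """Return (H_line, H_sphere) for boundaries matched to spike probability p."""
    r_line = solve_decreasing(line_probability, p, -10.0, 10.0)
    r_sphere = solve_decreasing(lambda r: sphere_probability_outside(r, d), p, 0.0, 20.0)
    return line_entropy(r_line), sphere_entropy(r_sphere, d)

if __name__ == "__main__":
    for d in (2, 3, 10):
        for p in (0.1, 0.3, 0.5):
            print(d, p, entropies_at_probability(p, d))
```

For the cases sampled here, with spiking outside the sphere, the planar boundary has the lower noise entropy at matched spike probability, in line with the statement in the text.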

As an example of a non-Gaussian probability distribution, we consider an
exponential distribution in two dimensions (2D):
$P(x, y) = \frac{1}{4} e^{-|x| - |y|}$. The local equation for optimal
contours (5) can be written parametrically:

$$\frac{d\phi}{ds} = \lambda + \sin\phi - \cos\phi, \qquad x > 0,\ y > 0,$$

$$\frac{dx}{ds} = \cos\phi, \qquad \frac{dy}{ds} = \sin\phi, \tag{10}$$


where the angle $\phi$ determines the tangent
$\mathbf{t} = (\cos\phi, \sin\phi)$ and normal
$\hat{\mathbf{n}} = (-\sin\phi, \cos\phi)$ of the curve, as well as the
curvature $\kappa = -d\phi/ds$. Solutions in other quadrants can be obtained
from Eq. (10) by an appropriate change of variables.
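Eq. (10) is a small system of ordinary differential equations and can be traced numerically. The following is a minimal sketch, assuming a fixed-step fourth-order Runge-Kutta scheme (a choice made here, not taken from the text) and hypothetical function names. It follows a contour inside the first quadrant until it leaves that quadrant. Since $dy/ds - dx/ds = d\phi/ds - \lambda$ by Eq. (10), the combination $y - x - \phi + \lambda s$ should stay constant along the path, which gives a simple consistency check.

```python
import math

def rhs(state, lam):
    """Right-hand side of Eq. (10) for state = (phi, x, y) in the quadrant x > 0, y > 0."""
    phi, _x, _y = state
    return (lam + math.sin(phi) - math.cos(phi), math.cos(phi), math.sin(phi))

def rk4_step(state, lam, ds):
    """One classical Runge-Kutta step of arc length ds."""
    def shifted(base, slope, scale):
        return tuple(b + scale * k for b, k in zip(base, slope))
    k1 = rhs(state, lam)
    k2 = rhs(shifted(state, k1, 0.5 * ds), lam)
    k3 = rhs(shifted(state, k2, 0.5 * ds), lam)
    k4 = rhs(shifted(state, k3, ds), lam)
    return tuple(
        s + ds * (a + 2.0 * b + 2.0 * c + e) / 6.0
        for s, a, b, c, e in zip(state, k1, k2, k3, k4)
    )

def trace_contour(lam, phi0, x0, y0, ds=1e-3, max_steps=100000):
    """Integrate Eq. (10) from (x0, y0) with tangent angle phi0.

    Returns a list of (s, phi, x, y) samples; stops when the curve
    leaves the first quadrant or max_steps is reached.
    """
    path = [(0.0, phi0, x0, y0)]
    state = (phi0, x0, y0)
    for step in range(1, max_steps + 1):
        state = rk4_step(state, lam, ds)
        if state[1] < 0.0 or state[2] < 0.0:
            break
        path.append((step * ds, *state))
    return path

if __name__ == "__main__":
    path = trace_contour(-0.5, math.pi / 2.0, 1.3, 0.0)
    s, phi, x, y = path[-1]
    print("left the quadrant near", (x, y), "after arc length", s)
```

With $\lambda = -1$ and $\phi_0 = \pi/2$ the right-hand side of the angle equation vanishes, so the same routine reproduces a vertical straight line over short arc lengths.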

For $\lambda = \pm 1$, the family of optimal contours includes straight lines
parallel to coordinate axes. Such straight lines represent 1D threshold
decisions, and in this case the noise entropy equals the spike probability,
decreasing exponentially with the threshold $r$ for the decision $x > r$:

$$H_{\rm independent} = P_{\rm independent} = e^{-r}/2. \tag{11}$$



The only other straight line solution that satisfies the optimality condition
in Eq. (10) is a line $y = \pm x$; it corresponds to spike probability
$p = 1/2$. Straight lines of the same angle that do not pass through the
origin do not satisfy the optimality condition, but they provide a useful
benchmark for other solutions in the middle range of probabilities
$0.2 < p < 0.8$, where they are better than the straight lines parallel to the
axes: $H_{\pi/4} = \sqrt{2}\,(r + 1) \exp(-r)/4$,
$P_{\pi/4} = (r + 2) \exp(-r)/4$, as Fig. 3 illustrates.
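The two straight-line benchmarks can be compared directly. The sketch below is an illustration with hypothetical function names, and it interprets $r$ for the diagonal line as the intercept in $x + y > r$ with $r \geq 0$, an assumption that is consistent with the quoted formulas. It evaluates Eq. (11) and the expressions for $H_{\pi/4}$ and $P_{\pi/4}$, checks the diagonal entropy against a direct trapezoidal integral of $P(x, y)$ along the line, and finds the diagonal entropy at a given spike probability by bisection.

```python
import math

def axis_parallel_line(r):
    """(P, H) for the decision x > r, r >= 0, from Eq. (11)."""
    value = 0.5 * math.exp(-r)
    return value, value

def diagonal_line(r):
    """(P, H) for the decision x + y > r, r >= 0 (assumed meaning of r)."""
    p = (r + 2.0) * math.exp(-r) / 4.0
    h = math.sqrt(2.0) * (r + 1.0) * math.exp(-r) / 4.0
    return p, h

def diagonal_entropy_numeric(r, reach=20.0, n=40000):
    """Trapezoidal integral of P(x, y) = exp(-|x| - |y|) / 4 along the line y = r - x."""
    start, stop = -reach, reach + r
    step = (stop - start) / n
    total = 0.0
    for k in range(n + 1):
        x = start + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * 0.25 * math.exp(-abs(x) - abs(r - x))
    # The line element along y = r - x is sqrt(2) dx.
    return math.sqrt(2.0) * total * step

def diagonal_entropy_at(p, iters=80):
    """Noise entropy of the diagonal line tuned to spike probability p <= 1/2."""
    lo, hi = 0.0, 60.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if diagonal_line(mid)[0] > p:
            lo = mid
        else:
            hi = mid
    return diagonal_line(0.5 * (lo + hi))[1]

if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2, 0.3, 0.5):
        # For axis-parallel lines H equals p, so p itself is the benchmark.
        print(p, "axis-parallel:", p, "diagonal:", diagonal_entropy_at(p))
```

In this sketch the diagonal line has the lower entropy at $p = 0.3$ and the higher one at $p = 0.1$, with the two families nearly tied around $p \approx 0.2$.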

Within a single quadrant, the optimal solution can be found explicitly in
terms of the angle $\phi$ relative to the starting point where
$\phi = \phi_0$, $x_0 = x(\phi_0)$, and $y_0 = y(\phi_0)$:

$$x(\phi) + y(\phi) = x_0 + y_0 + \ln \frac{\lambda + \sin\phi - \cos\phi}{\lambda + \sin\phi_0 - \cos\phi_0},$$

$$y(\phi) - x(\phi) = y_0 - x_0 + \phi - \phi_0 - \lambda \left[ s(\phi) - s(\phi_0) \right], \tag{12}$$

where the arc length $s(\phi)$ depends on the angle $\phi$ as:






$$s(\phi) = \begin{cases} \dfrac{1}{\sqrt{2 - \lambda^2}} \ln \left| \dfrac{u(\phi) - 1}{u(\phi) + 1} \right|, & |\lambda| < \sqrt{2}, \\[2ex] \dfrac{2}{\sqrt{\lambda^2 - 2}} \tan^{-1} u(\phi), & |\lambda| > \sqrt{2}, \end{cases} \tag{13}$$

$$u(\phi) = \left( 1 + (\lambda + 1) \tan(\phi/2) \right) / \sqrt{|\lambda^2 - 2|}.$$

FIG. 2: Optimal solutions for 2D exponential inputs: (A) closed "stretched
circle" solutions are shown for $\lambda = -0.3, -0.9, -0.99$; numbers
correspond to the increasing size of the curved segment throughout this
legend. (B) Extended solutions symmetric around the $y = x$ line are shown for
$\lambda = 0, -0.25, -0.5, -0.75$. (C) Extended solutions symmetric around the
$x = 0$ line are shown for $\lambda = 0, -0.5, -0.75$ [this type turned out to
be suboptimal, albeit by a small margin, compared to either A or B, cf.
Fig. 3].

These solutions are similar to a logarithmic spiral for
$|\lambda| > \sqrt{2}$, and to a hyperbola for $|\lambda| < \sqrt{2}$, with
asymptotes at $\pi/4 - \arcsin(\lambda/\sqrt{2})$ and
$5\pi/4 + \arcsin(\lambda/\sqrt{2})$. Asymptotes themselves are valid
solutions within a quadrant; they will be part of a global solution. For all
$\lambda$ the solution (12) intersects coordinate axes where it should be
matched with similar solutions in other quadrants.
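The closed-form arc length can be checked against a direct quadrature of $ds = d\phi / (\lambda + \sin\phi - \cos\phi)$, which follows from Eq. (10). The sketch below is illustrative and uses hypothetical function names. It assumes the logarithmic form $\ln|(u-1)/(u+1)|/\sqrt{2-\lambda^2}$ for $|\lambda| < \sqrt{2}$ and the arctangent form $2\tan^{-1}(u)/\sqrt{\lambda^2-2}$ for $|\lambda| > \sqrt{2}$, obtained with the substitution $t = \tan(\phi/2)$. It also verifies that the two asymptote angles are zeros of $\lambda + \sin\phi - \cos\phi$.

```python
import math

def arc_length_closed_form(phi, lam):
    """Arc length s(phi) up to an additive constant (assumed closed form, see lead-in)."""
    root = math.sqrt(abs(lam * lam - 2.0))
    u = (1.0 + (lam + 1.0) * math.tan(0.5 * phi)) / root
    if lam * lam < 2.0:
        return math.log(abs((u - 1.0) / (u + 1.0))) / root
    return 2.0 * math.atan(u) / root

def arc_length_numeric(phi_a, phi_b, lam, n=2000):
    """Simpson's rule for the integral of d(phi) / (lam + sin(phi) - cos(phi))."""
    def integrand(phi):
        return 1.0 / (lam + math.sin(phi) - math.cos(phi))
    step = (phi_b - phi_a) / n
    total = integrand(phi_a) + integrand(phi_b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(phi_a + k * step)
    return total * step / 3.0

def asymptote_angles(lam):
    """Angles at which d(phi)/ds = 0, i.e. straight-line solutions within a quadrant."""
    shift = math.asin(lam / math.sqrt(2.0))
    return math.pi / 4.0 - shift, 5.0 * math.pi / 4.0 + shift

if __name__ == "__main__":
    phi_a, phi_b = 0.5 * math.pi, 0.9 * math.pi
    for lam in (-0.5, 0.5, 2.0):
        closed = arc_length_closed_form(phi_b, lam) - arc_length_closed_form(phi_a, lam)
        print(lam, closed, arc_length_numeric(phi_a, phi_b, lam))
```

The comparison is restricted to an angular interval on which $\tan(\phi/2)$ is continuous and the integrand has no pole, so that differences of the closed form can be taken without tracking branches.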

The possible types of global solutions are shown in Fig. 2. They could be
either closed ("stretched circles"; A) or extended (B and C) [40]. For
$|\lambda| < \sqrt{2}$, extended solutions can be formed by connecting
asymptotes in two separate quadrants with a convex curve described by
Eqs. (12, 13). We will refer to such extended solutions as B or C depending
upon whether the curved segment passes through one or two quadrants, cf.
Fig. 2. Extended solutions B are symmetric around the $y = x$ line, and exist
only for $-1 < \lambda < 0$, while extended solutions C are symmetric around
the $x = 0$ line, and exist for $-1 < \lambda < 1$.

FIG. 3: Noise entropy $H$ along various decision boundaries with exponential
inputs: "stretched circles" with spiking outside A or inside A', extended
solutions B and C; straight lines parallel to coordinate axes (1) and at
$\pm\pi/4$ angle (2). Solutions (A-C) and (1) satisfy the optimality
Eq. (10), but not (2), which is optimal only at a single point at $p = 1/2$
where it becomes part of the family of extended solutions B. Inset shows that
switching occurs between solutions A and B.

For all types of global solutions (A-C), boundary conditions specify a unique
curve for each value of $\lambda$. In all cases, both entropy and probability
can be found exactly as a function of $\lambda$. For solutions A we find



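$$H_A = \frac{2 - \lambda(\lambda + 1)\Delta s_A}{2 - \lambda^2}\, e^{-R_A}, \tag{14}$$

$$P_A = \left[ 1 + \frac{(\lambda + 1)\Delta s_A - \lambda}{2 - \lambda^2} \right] e^{-R_A}, \tag{15}$$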



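with expressions for the arc length $\Delta s_A$ and the size of the curved
segment $R_A$ from [41]. For extended solutions B, the entropy and probability
become [42]

$$H_B = \frac{2(\sqrt{2 - \lambda^2} - \lambda) - \lambda \Delta s_B (\lambda + \sqrt{2 - \lambda^2})}{4(2 - \lambda^2)}\, e^{-R_B}, \tag{16}$$

$$P_B = \frac{4 - \lambda^2 - \lambda\sqrt{2 - \lambda^2} + \Delta s_B (\lambda + \sqrt{2 - \lambda^2})}{4(2 - \lambda^2)}\, e^{-R_B}. \tag{17}$$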


More detailed calculations show that solutions C are suboptimal compared to
global solutions A or B; see Fig. 3 and the discussion below. Note that
neither A, B, nor C solutions exist for $\lambda < -1$.
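The closed (A) and extended (B) families can be compared at matched spike probability. The sketch below is an illustration with hypothetical function names, and it rests on explicit assumptions about the formulas, which are partly damaged in the source. For type A it uses $H_A = [2 - \lambda(\lambda+1)\Delta s_A]\, e^{-R_A}/(2-\lambda^2)$ and $P_A = [1 + ((\lambda+1)\Delta s_A - \lambda)/(2-\lambda^2)]\, e^{-R_A}$, with $\Delta s_A$ and $R_A$ as in note [41]. For type B it uses Eqs. (16) and (17), with $\Delta s_B = -\ln(1 + \lambda\sqrt{2-\lambda^2})/\sqrt{2-\lambda^2}$ and $R_B$ as in note [42]. The bisection additionally assumes that $P_A$ increases with $\lambda$ on $(-1, 0)$.

```python
import math

def closed_solution(lam):
    """(P, H) for a type A contour with spiking outside, -1 < lam < sqrt(2) (assumed forms)."""
    c = math.sqrt(2.0 - lam * lam)
    ds = math.log((lam + 2.0 + c) / (lam + 2.0 - c)) / c   # arc length, note [41]
    size = math.pi / 4.0 - 0.5 * lam * ds                  # R_A
    scale = math.exp(-size) / (2.0 - lam * lam)
    h = (2.0 - lam * (lam + 1.0) * ds) * scale
    p = (2.0 - lam * lam - lam + (lam + 1.0) * ds) * scale
    return p, h

def extended_solution_b(lam):
    """(P, H) for a type B contour, -1 < lam <= 0 (assumed forms)."""
    c = math.sqrt(2.0 - lam * lam)
    ds = -math.log(1.0 + lam * c) / c                      # arc length, note [42]
    size = -math.asin(lam / math.sqrt(2.0)) - 0.5 * lam * ds   # R_B
    scale = math.exp(-size) / (4.0 * (2.0 - lam * lam))
    h = (2.0 * (c - lam) - lam * ds * (lam + c)) * scale
    p = (4.0 - lam * lam - lam * c + ds * (lam + c)) * scale
    return p, h

def closed_entropy_at(p_target, iters=80):
    """H_A at spike probability p_target, by bisection on lam in (-1, 0).

    Assumes P_A is increasing in lam on this interval and p_target < P_A(0).
    """
    lo, hi = -0.999999, 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if closed_solution(mid)[0] < p_target:
            lo = mid
        else:
            hi = mid
    return closed_solution(0.5 * (lo + hi))[1]

if __name__ == "__main__":
    for lam in (-0.25, -0.5, -0.75):
        p_b, h_b = extended_solution_b(lam)
        print(lam, "p =", p_b, "H_B =", h_b, "H_A =", closed_entropy_at(p_b))
```

At $\lambda = 0$ the type B expressions reduce to $P = 1/2$ and $H = \sqrt{2}/4$, the values for the straight line through the origin, which is a useful sanity check on the assumed forms.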

The most physiologically relevant regime corresponds to
$\lambda = -1 + \epsilon$, $\epsilon \ll 1$. Here, all global solutions A-C
have a large "radius" of the curved segment, $R \approx -\ln\epsilon$. The
probability and noise entropy depend exponentially on $R$, so that
$P \approx \sqrt{2}\, e^{-\pi/4} (\epsilon - 3\epsilon^2 \ln\epsilon) + \alpha\epsilon^2$
and
$H/P \approx 1 - \frac{\epsilon}{2} + \frac{\epsilon^2}{2} \ln\epsilon + \beta\epsilon^2$.
The constants $\alpha$ and $\beta$ depend on the solution type (A-C). Because
$\beta_A < \beta_C < \beta_B$, solutions A are optimal for small $\epsilon$.
Near $p \approx 0.2$, intersections between the three curves occur. In the
$O(\epsilon^2)$ approximation, all three curves intersect at a single point,
which splits into three once higher-order terms are included. As probability
increases, B and C intersect first (A goes below), then A and B (the crossover
point, C goes above), and finally, A and C (B goes below). The inset of Fig. 3
shows A-B and A-C intersections. Thus, solutions A and B are optimal at
extreme and medium probabilities, respectively. Solutions of type C are never
optimal, and neither are the straight line solutions, except for the middle
point $p = 1/2$.
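The small-$\epsilon$ behavior can be checked for the closed solutions. The sketch below is illustrative, with hypothetical function names, and it assumes the type A expressions $H_A = [2 - \lambda(\lambda+1)\Delta s_A]\, e^{-R_A}/(2-\lambda^2)$ and $P_A = [1 + ((\lambda+1)\Delta s_A - \lambda)/(2-\lambda^2)]\, e^{-R_A}$ together with $\Delta s_A$ and $R_A$ from note [41]. It compares them at $\lambda = -1 + \epsilon$ with the leading forms $P \approx \sqrt{2}\, e^{-\pi/4}(\epsilon - 3\epsilon^2\ln\epsilon)$ and $H/P \approx 1 - \epsilon/2 + (\epsilon^2/2)\ln\epsilon$, leaving out the type-dependent $\alpha\epsilon^2$ and $\beta\epsilon^2$ terms.

```python
import math

def closed_solution(lam):
    """(P, H) for a type A contour with spiking outside (assumed forms, see lead-in)."""
    c = math.sqrt(2.0 - lam * lam)
    ds = math.log((lam + 2.0 + c) / (lam + 2.0 - c)) / c
    size = math.pi / 4.0 - 0.5 * lam * ds
    scale = math.exp(-size) / (2.0 - lam * lam)
    h = (2.0 - lam * (lam + 1.0) * ds) * scale
    p = (2.0 - lam * lam - lam + (lam + 1.0) * ds) * scale
    return p, h

def small_epsilon_forms(eps):
    """Leading small-epsilon forms of P and H/P, without the alpha and beta terms."""
    p = math.sqrt(2.0) * math.exp(-math.pi / 4.0) * (eps - 3.0 * eps * eps * math.log(eps))
    ratio = 1.0 - 0.5 * eps + 0.5 * eps * eps * math.log(eps)
    return p, ratio

if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        p, h = closed_solution(-1.0 + eps)
        p_lead, ratio_lead = small_epsilon_forms(eps)
        print(eps, p, p_lead, h / p, ratio_lead)
```

Very small values of $\epsilon$ are avoided because $\lambda + 2 - \sqrt{2 - \lambda^2}$ is of order $\epsilon^2$ and loses precision to cancellation in floating point.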

In summary, we have presented a general approach to 
finding optimal binary separations of multidimensional 
inputs. In the small noise limit, the curvature of the op- 
timal bounding surface is determined locally by the prob- 
ability distribution. While Gaussian inputs are optimally 
separated by hyperplanes, this is not the case in general. 
For example, in the case of exponentially distributed in- 
puts in two dimensions, the optimal decision contours are 
curved and could either be closed or extended. Closed 
contours are optimal at extreme probabilities, while ex- 
tended ones are optimal for spike probabilities near 1/2. 
The ubiquity of non-Gaussian signals in nature, partic- 
ularly of the exponential distributions considered here, 
suggests that these results will be relevant for neurons 
across different sensory modalities. 
[39] [...] causing contours to self-intersect before a closed smooth contour
can be obtained.
[40] For the 2D exponential distribution, no curved solution that extends to
infinity and is confined to one or two quadrants can be better than the
$H = P$ solution achievable by straight lines parallel to the axes, Eq. (11).
This is due to the arc length factor $\sqrt{1 + (y'(x))^2}$ in the noise
entropy,

$$H = \frac{1}{4} \int dx\, \sqrt{1 + (y'(x))^2}\; e^{-x - y(x)}, \tag{18}$$

$$P = \frac{1}{4} \int dx\, e^{-x - y(x)}, \tag{19}$$

which makes $H > P$ for all such solutions. This argument does not apply to
solutions spanning three or four quadrants, shown in Fig. 2.
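[41] For solutions of type A, the arc length is

$$\Delta s_A = \frac{1}{\sqrt{2 - \lambda^2}} \ln \frac{\lambda + 2 + \sqrt{2 - \lambda^2}}{\lambda + 2 - \sqrt{2 - \lambda^2}} \tag{20}$$

for $-1 < \lambda < \sqrt{2}$, and

$$\Delta s_A = \frac{2}{\sqrt{\lambda^2 - 2}} \left( \frac{\pi}{2} - \tan^{-1} \frac{\lambda + 2}{\sqrt{\lambda^2 - 2}} \right) \tag{21}$$

for $\lambda > \sqrt{2}$. The size of the curved segment is
$R_A = \pi/4 - \lambda \Delta s_A/2$. The solutions are continuous at
$\lambda = \sqrt{2}$.

[42] For extended solutions of type B, the arc length is

$$\Delta s_B = -\frac{\ln(1 + \lambda\sqrt{2 - \lambda^2})}{\sqrt{2 - \lambda^2}}; \tag{22}$$

these solutions are valid only for $-1 < \lambda < 0$. The corresponding size
of the curved segment is
$R_B = -\arcsin(\lambda/\sqrt{2}) - \lambda \Delta s_B/2$.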



